A distribution that shows the likelihood of the possible results of a experiment with repeated trials in which each trial can result in a specified number of outcomes that is greater than two. A multinomial distribution could show the results of tossing a dice, because a dice can land on one of six possible values. By contrast, the results of a coin toss would be shown using a binomial distribution because there are only two possible results of each toss, heads or tails.
Two additional key characteristics of a multinomial distribution are that the trials it illustrates must be independent (e.g., in the dice experiment, rolling a five does not have any impact on the number that will be rolled next) and the probability of each possible result must be constant (e.g., on each roll, there is a one in six chance of any number on the die coming up).
one.dice <- function(){
dice <- sample(1:6, size = 1, replace = TRUE)
return(dice)
}
one.dice() #what is happening here?? try this several times.
## [1] 5
#what is hapening here?
par(mfrow=c(2,2))
for (i in 1:4){
sims <- replicate(100, one.dice())
table(sims)
table(sims)/length(sims)
plot(table(sims), xlab = 'Event', ylab = 'Frequency')
}
#what is hapening here?
par(mfrow=c(2,2))
for (i in 1:4){
sims <- replicate(10000, one.dice())
table(sims)
table(sims)/length(sims)
plot(table(sims), xlab = 'Event', ylab = 'Frequency')
}
From the class example
Let’s say that Molly, Ryan and Mr.Bob are buying beer(x1) ,bread(x2) and coke(x3) with probabilities (3/5,1/5,1/5).
What is the probability that only 1 of them will buy beer, 2 of them will buy Bread , none will buy coke ? Compare the result with theoratical probability.
Do a simulation for this scenario and plot the marginal distribution of x1.
Helpful links to answer this question:
-> Contour plot also gives the densities.
https://blog.revolutionanalytics.com/2016/02/multivariate_data_with_r.html
-> Then we have these ellipses; the circular symmetric version of complex normal distribution. https://en.wikipedia.org/wiki/Elliptical_distribution
ellipse: https://en.wikipedia.org/wiki/Ellipse
-> http://cs229.stanford.edu/section/gaussians.pdf
This tells you how when correlation coefficient increases the distribution spread and how the ellipses look like. -> https://online.stat.psu.edu/stat505/book/export/html/636
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.1
## Warning: package 'readr' was built under R version 4.1.1
## Warning: package 'stringr' was built under R version 4.1.1
## Warning: package 'forcats' was built under R version 4.1.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(mvtnorm)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:plotly':
##
## select
## The following object is masked from 'package:dplyr':
##
## select
library(ggplot2)
Source: https://data-se.netlify.app/2018/12/13/visualizing-a-multivariate-normal-distribution/
First, let’s define a covariance matrix Σ:
sigma <- matrix(c(4,2,2,3), ncol = 2)
sigma
## [,1] [,2]
## [1,] 4 2
## [2,] 2 3
Then, simulate observations n = n from these covariance matrix; the means need be defined, too. As the rank of our covariance matrix is 2, we need two means:
means <- c(0, 0)
n <- 1000
set.seed(42)
x <- rmvnorm(n = n, mean = means, sigma = sigma)
str(x)
## num [1:1000, 1:2] 2.314 1.053 0.716 2.848 3.839 ...
Let’s make a data frame out of it:
d <- data.frame(x)
names(d)
## [1] "X1" "X2"
d %>%
ggplot(aes(x = X1)) +
geom_density()
p1 <- data_frame(x = -3:3) %>%
ggplot(aes(x = x)) +
stat_function(fun = dnorm, n = n)
## Warning: `data_frame()` was deprecated in tibble 1.1.0.
## Please use `tibble()` instead.
p1
p2 <- ggplot(d, aes(x = X1, y = X2)) +
geom_point(alpha = .5) +
geom_density_2d()
p2
p3 <- ggplot(d, aes(x = X1, y = X2)) +
geom_point(alpha = .5) +
geom_bin2d() +
scale_fill_viridis_c()
p3
(p <- plot_ly(d, x = ~X1, y = ~X2))
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#scatter
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
add_histogram2d(p)
add_histogram2dcontour(p)
dens <- kde2d(d$X1, d$X2)
plot_ly(x = dens$x,
y = dens$y,
z = dens$z) %>% add_surface()
First, compute the density of each (X1, X2) pair.
d$dens <- dmvnorm(x = d)
Now plot a point for each (X1, X2, dens) tuple.
p4 <- plot_ly(d, x = ~ X1, y = ~ X2, z = ~ dens,
marker = list(color = ~ dens,
showscale = TRUE)) %>%
add_markers()
p4